Rule-based Topic Mining to Assist User-centered Visual Exploration of Document Collections

Authors

  • ROBERTO PINHO
  • MARIA CRISTINA F. OLIVEIRA
  • ROSANE MINGHIM
  • ALNEU DE ANDRADE LOPES
  • RENATO RODRIGUES
Abstract

We propose a three-step iterative and interactive visual text mining process to assist users in exploring document collections. In the proposed approach (i) topics are automatically extracted from a document collection, (ii) users explore a similarity-based document map and its related topics while refining a topic list, and (iii) both the map quality and the topic list definition can be improved based on user interaction. A selective and sequential covering association rule induction strategy is employed to extract the topics. In this strategy, association rules are sequentially induced from groupings selected (manually or automatically) in the similarity-based document maps. Resulting topics are displayed on a Topic Tree control window that assists users in exploring the collection by (i) identifying documents related to specific topics in the map, (ii) removing uninteresting documents from the map based on their topics, (iii) comparing related topics and documents, (iv) extracting new topics from user-selected map regions or from the entire map, (v) building derived maps, and (vi) eventually exporting sets of labeled documents. Derived maps inherit the previous topic definitions, while benefiting from the removal of undesired documents and, optionally, from the use of terms descriptive of relevant topics to compute document similarity. We present two case studies – on an online news corpus and on a collection of scientific papers – to illustrate our process and its suitability to explore document
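The abstract names a sequential covering association rule induction strategy but does not spell it out. The following is only a minimal sketch of the general idea, not the authors' implementation: it assumes documents are represented as sets of terms, that a rule has the form "terms → document belongs to the selected grouping", and that rules are ranked by coverage, then confidence, then number of terms. The function name `induce_topics`, its parameters, the thresholds, and the toy data are all hypothetical.

```python
from itertools import combinations


def induce_topics(docs, selected, min_support=2, min_confidence=0.6, max_terms=2):
    """Sequential covering over one selected grouping (illustrative only).

    docs:     dict mapping document id -> set of terms (whole collection)
    selected: set of document ids forming the grouping picked on the map
    Returns a list of (terms, covered_ids) pairs, one per induced topic.
    """
    uncovered = set(selected)
    topics = []
    while uncovered:
        # Candidate term sets are drawn only from still-uncovered documents.
        candidates = set()
        for doc_id in uncovered:
            terms = sorted(docs[doc_id])
            for size in range(1, max_terms + 1):
                candidates.update(combinations(terms, size))

        best = None
        for cand in sorted(candidates):
            items = set(cand)
            covered = {d for d in uncovered if items <= docs[d]}
            if len(covered) < min_support:
                continue
            # Confidence of "terms -> in grouping", measured on the whole collection.
            matching = [d for d in docs if items <= docs[d]]
            confidence = sum(d in selected for d in matching) / len(matching)
            if confidence < min_confidence:
                continue
            key = (len(covered), confidence, len(cand))
            if best is None or key > best[0]:
                best = (key, cand, covered)

        if best is None:
            break  # no remaining rule meets the thresholds
        _, cand, covered = best
        topics.append((cand, covered))
        uncovered -= covered  # covering step: drop documents explained by the rule
    return topics


# Toy collection (made up for illustration).
docs = {
    1: {"election", "vote", "senate"},
    2: {"election", "vote", "poll"},
    3: {"election", "campaign"},
    4: {"match", "goal", "league"},
    5: {"match", "goal", "coach"},
    6: {"vote", "league"},
}
topics = induce_topics(docs, selected={1, 2, 4, 5})
for terms, covered in topics:
    print(terms, sorted(covered))
```

Each pass keeps the best rule, labels the documents it covers, and removes them before inducing the next rule, which is what makes the covering sequential. A real system would likely use a proper frequent-itemset miner and weighting of terms rather than this exhaustive enumeration.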

Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Feature extraction in opinion mining through Persian reviews

Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...

Visualization Techniques to Explore Data Mining Results for Document Collections

Data mining has been informally introduced as large scale search for interesting patterns in data. It is often an explorative task iteratively performed within the process of knowledge discovery in databases. In this process, interactive visualization techniques are also successfully applied for data exploration. We deal in this paper with the synergy of these two complemental approaches. Where...

Visualization and Clustering of Document Collections using a Flock-based Swarm Intelligence Technique

Electronic availability of documents continues to increase, yet identifying documents relevant to the user remains a primary constraint in electronic document use. Visual representations of document collections can facilitate search by representing large collections of documents in a manner that is complementary to linear, text based representations. Visual representations can provide a means t...

TopicViz: Semantic Navigation of Document Collections

When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling – a technique for identifying latent themes across large collections of documents – can support semantic exploration. We present TopicViz, an interactive environment for inf...

Publication date: 2009